CorrGAN: Simultaneous Learning of Speech Enhancement and Perceptual Quality Loss Functions

https://doi.org/10.1109/ICASSP49660.2025.10887633

Zadorozhnyy, Vasily; Amizadeh, Saeed; Ye, Qiang; Koishida, Kazuhito (April 2025, IEEE)

Deep learning models have enabled effective end-to-end systems in the speech enhancement (SE) field. Most of these methods are trained in a supervised setting with a fixed reconstruction loss. These losses often do not perfectly represent the desired perceptual quality metrics, resulting in sub-optimal performance. Recently, there have been efforts to learn the behavior of those metrics directly via neural networks for training SE models. However, accurately estimating the true metric function introduces statistical complexity into training, because the estimator attempts to capture the exact value of the metric. We propose an adversarial training strategy based on statistical correlation that avoids the complexity of estimating the SE metric while learning to mimic its overall behavior. We call this framework CorrGAN, show that it significantly improves over the standard losses of the state-of-the-art (SOTA) baselines, and achieve SOTA performance on the VoiceBank+DEMAND dataset.
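
The contrast between capturing a metric's exact value and mimicking its overall behavior can be illustrated with a small sketch. This is not the authors' implementation: the abstract says only "statistical correlation", so Pearson correlation is assumed here, the function names `pearson_corr` and `correlation_loss` are hypothetical, and the scores are made-up numbers rather than results from the paper.

```python
import math


def pearson_corr(xs, ys, eps=1e-8):
    """Pearson correlation between two equal-length sequences of floats."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # eps guards against division by zero when one sequence is constant
    return cov / (math.sqrt(var_x * var_y) + eps)


def correlation_loss(predicted_scores, metric_scores):
    """Hypothetical loss: near 0 when the scores are perfectly correlated
    with the metric, near 2 when they are perfectly anti-correlated."""
    return 1.0 - pearson_corr(predicted_scores, metric_scores)


# Made-up perceptual metric values for a batch of four utterances (assumption)
metric = [1.5, 2.0, 3.0, 4.0]
# Made-up learned scores on a different scale that track the metric linearly
learned = [0.15, 0.20, 0.30, 0.40]

loss_corr = correlation_loss(learned, metric)
loss_mse = sum((s - m) ** 2 for s, m in zip(learned, metric)) / len(metric)
print(f"correlation loss: {loss_corr:.6f}, MSE: {loss_mse:.4f}")
```

In this toy case the correlation-based loss is close to zero even though the mean squared error is large, because correlation ignores scale and offset. A score function trained this way only has to rank and track the metric, not reproduce its values.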

Free, publicly-accessible full text available April 6, 2026